10 research outputs found

    Term selection in information retrieval

    Systems trained on linguistically annotated data achieve strong performance for many language processing tasks. This encourages the idea that annotations can improve any language processing task if applied in the right way. However, despite widespread acceptance and availability of highly accurate parsing software, it is not clear that ad hoc information retrieval (IR) techniques using annotated documents and requests consistently improve search performance compared to techniques that use no linguistic knowledge. In many cases, retrieval gains made using language processing components, such as part-of-speech tagging and head-dependent relations, are offset by significant negative effects. This results in a minimal positive, or even negative, overall impact for linguistically motivated approaches compared to approaches that do not use any syntactic or domain knowledge. In some cases, it may be that syntax does not reveal anything of practical importance about document relevance. Yet without a convincing explanation for why linguistic annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text can result in the repeated application, and mis-application, of language processing to enhance search performance. This dissertation investigates whether linguistics can improve the selection of query terms by better modelling the alignment process between natural language requests and search queries. It is the most comprehensive work on the utility of linguistic methods in IR to date. Term selection in this work focuses on identification of informative query terms of 1-3 words that both represent the semantics of a request and discriminate between relevant and non-relevant documents. Approaches to word association are discussed with respect to linguistic principles, and evaluated with respect to semantic characterization and discriminative ability. 
Analysis is organised around three theories of language that emphasize different structures for the identification of terms: phrase structure theory, dependency theory and lexicalism. The structures identified by these theories play distinctive roles in the organisation of language. Evidence is presented regarding the value of different methods of word association based on these structures, and the effect of method and term combinations. Two highly effective, novel methods for the selection of terms from verbose queries are also proposed and evaluated. The first method focuses on the semantic phenomenon of ellipsis with a discriminative filter that leverages diverse text features. The second method exploits a term ranking algorithm, PhRank, that uses no linguistic information and relies on a network model of query context. The latter focuses queries so that 1-5 terms in an unweighted model achieve better retrieval effectiveness than weighted IR models that use up to 30 terms. In addition, unlike models that use a weighted distribution of terms or subqueries, the concise terms identified by PhRank are interpretable by users. Evaluation with newswire and web collections demonstrates that PhRank-based query reformulation significantly improves performance of verbose queries by up to 14% compared to highly competitive IR models, and is at least as good for short, keyword queries with the same models. Results illustrate that linguistic processing may help with the selection of word associations but does not necessarily translate into improved IR performance. Statistical methods are necessary to overcome the limits of syntactic parsing and word adjacency measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness, but methods that use simple features can be substantially more efficient and equally, or more, effective.
Various explanations for this finding are suggested, including the probabilistic nature of grammatical categories, a lack of homomorphism between syntax and semantics, the impact of lexical relations, variability in collection data, and systemic effects in language systems.

    Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval

    Techniques that compare short text segments using dependency paths (or simply, paths) appear in a wide range of automated language processing applications including question answering (QA). However, few models in ad hoc information retrieval (IR) use paths for document ranking due to the prohibitive cost of parsing a retrieval collection. In this paper, we introduce a flexible notion of paths that describe chains of words on a dependency path. These chains, or catenae, are readily applied in standard IR models. Informative catenae are selected using supervised machine learning with linguistically informed features and compared to both non-linguistic terms and catenae selected heuristically with filters derived from work on paths. Automatically selected catenae of 1-2 words deliver significant performance gains on three TREC collections.

    Mainstreaming adult ADHD into primary care in the UK: guidance, practice, and best practice recommendations

    BACKGROUND: ADHD in adults is a common and debilitating neurodevelopmental mental health condition. Yet, diagnosis, clinical management and monitoring are frequently constrained by scarce resources, low capacity in specialist services and limited awareness or training in both primary and secondary care. As a result, many people with ADHD experience serious barriers in accessing the care they need. METHODS: Professionals across primary, secondary, and tertiary care met to discuss adult ADHD clinical care in the United Kingdom. Discussions identified constraints in service provision, and service delivery models with potential to improve healthcare access and delivery. The group aimed to provide a roadmap for improving access to ADHD treatment, identifying avenues for improving provision under current constraints, and innovating provision in the longer-term. National Institute for Health and Care Excellence (NICE) guidelines were used as a benchmark in discussions. RESULTS: The group identified three interrelated constraints. First, inconsistent interpretation of what constitutes a ‘specialist’ in the context of delivering ADHD care. Second, restriction of service delivery to limited capacity secondary or tertiary care services. Third, financial limitations or conflicts which reduce capacity and render transfer of care between healthcare sectors difficult. The group recommended the development of ADHD specialism within primary care, along with the transfer of routine and straightforward treatment monitoring to primary care services. Longer term, ADHD care pathways should be brought into line with those for other common mental health disorders, including treatment initiation by appropriately qualified clinicians in primary care, and referral to secondary mental health or tertiary services for more complex cases. 
Long-term plans in the NHS for more joined-up and flexible provision, using a primary care network approach, could invest in developing shared ADHD specialist resources. CONCLUSIONS: The relegation of adult ADHD diagnosis, treatment and monitoring to specialist tertiary and secondary services is at odds with its high prevalence and chronic course. To enable the cost-effective and at-scale access to ADHD treatment that is needed, general adult mental health and primary care must be empowered to play a key role in the delivery of quality services for adults with ADHD.

    Sensitivity of SARS-CoV-2 RNA polymerase chain reaction using a clinical and radiological reference standard: Clinical sensitivity of SARS-CoV-2 PCR.

    OBJECTIVES: Diagnostic tests for SARS-CoV-2 are important for epidemiology, clinical management, and infection control. Limitations of oro-nasopharyngeal real-time PCR sensitivity have been described based on comparisons of single tests with repeated sampling. We assessed SARS-CoV-2 PCR clinical sensitivity using a clinical and radiological reference standard. METHODS: Between March and May 2020, 2060 patients underwent thoracic imaging and SARS-CoV-2 PCR testing. Imaging was independently double- or triple-reported (if discordance) by blinded radiologists according to radiological criteria for COVID-19. We excluded asymptomatic patients and those with alternative diagnoses that could explain imaging findings. Associations with PCR-positivity were assessed with binomial logistic regression. RESULTS: 901 patients had possible/probable imaging features and clinical symptoms of COVID-19 and 429 patients met the clinical and radiological reference case definition. SARS-CoV-2 PCR sensitivity was 68% (95% confidence interval 64-73), was highest 7-8 days after symptom onset (78% (68-88)) and was lower among current smokers (adjusted odds ratio 0.23 (0.12-0.42), p …). CONCLUSIONS: In patients with clinical and imaging features of COVID-19, PCR test sensitivity was 68%, and was lower among smokers; a finding that could explain observations of lower disease incidence and that warrants further validation. PCR tests should be interpreted considering imaging, symptom duration and smoking status.

    How data science can advance mental health research

    Accessibility of powerful computers and availability of so-called big data from a variety of sources mean that data science approaches are becoming pervasive. However, their application in mental health research is often considered to be at an earlier stage than in other areas despite the complexity of mental health and illness making such a sophisticated approach particularly suitable. In this Perspective, we discuss current and potential applications of data science in mental health research using the UK Clinical Research Collaboration classification: underpinning research; aetiology; detection and diagnosis; treatment development; treatment evaluation; disease management; and health services research. We demonstrate that data science is already being widely applied in mental health research, but there is much more to be done now and in the future. The possibilities for data science in mental health research are substantial.